TIRE-seq T cell human clustering

The aim of this notebook is to cluster the samples.

Recap

I sequenced these samples with Prime-seq on extracted RNAs already. Results were not that great. Here reprocess the same samples with TIRE-seq for a head to head comparison.

Samples

  • Yasmin Nouri plates.
  • All wells have 10,000 cells in 50ul total volume.
  • Comprised of 25ul cells in media and 25ul 2x buffer TCL
  • The cell concentration is therefore 200 cells/uL.
  • So 20uL input will be 4000 cells.

Lab notes

The processing went as planned. Full writeup available at ELN link

Read data

This was generated in 1A_generateSCE-human

sce <- readRDS(here::here(
   "data/TIRE_Tcell/SCEs/tcell_basic.sce.rds"
))
sce_orig <- sce

tb <- as_tibble(colData(sce))

Recap quality control

This is explored more fully in notebook 1B. Here I recap only the important point.
Row H was largely a failure. This is donor 64, CD8 T cells.

plt1 <- ggplot(tb,
             aes(x = Timepoint, y= sum, colour = Donor, label=Well)) + 
  geom_text() +
  ylab("UMIs") + 
  xlab("Plate") +
  scale_y_continuous(trans='log10') +
  annotation_logticks(base = 10, sides = "l") +
  scale_colour_brewer(palette = "Dark2")

plt1

Remove outliers. 13 samples are removed.
I won’t use mitochondrial genes as a criteria here because the general increase in transcription upon stimulation.

discarded <- perCellQCFilters(sce)

colSums(as.data.frame(discarded))
##   low_lib_size low_n_features        discard 
##             11             13             13
sce <- sce_orig[,!discarded$discard]
discarded <- as_tibble(discarded)

# Summarise how many cells left
cell_drop_tb <- rbind(
  cbind(length(colnames(sce_orig)), length(colnames(sce)))
)

colnames(cell_drop_tb) <- c("Before_Filter", "After_filter")
cell_drop_tb
##      Before_Filter After_filter
## [1,]            96           83

Reads_UMIs after filter

tb <- as_tibble(colData(sce))

plt8 <- ggplot(data=tb, aes(y=sum+1, x=detected, colour=Timepoint, label=Well)) +
  geom_text(size=3) +
  scale_colour_brewer(type="qualitative", palette = "Dark2") +
  xlab("Genes") + ylab("UMIs") +
  scale_y_continuous(trans='log10') +
  annotation_logticks(base = 10, sides = "l")

plt8

View highly expressed genes

  • TMSB10 seems like an non-specific immune gene
  • The other genes are pretty typical of RNA-Seq as highly expressed genes.
plt1 <- plotHighestExprs(sce, n=10, colour_cells_by = "Timepoint") +
  theme_Publication()

plt1

Library size normalization and transformation

set.seed(666)

lib.sf <- librarySizeFactors(sce)
sce <- computeSumFactors(sce)
sce <- logNormCounts(sce)

Feature selection

Select the top 1000 most variable genes.

dec.sce <- modelGeneVar(sce)
fit.sce <- metadata(dec.sce)
hvg.sce.var <- getTopHVGs(dec.sce, n=1000)
sce <- runPCA(sce, subset_row=hvg.sce.var, ncomponents=30)

Visulaise the fit

tb <- as_tibble(cbind(fit.sce$mean, fit.sce$var))

colnames(tb) <- c("Mean", "Variance")

Visualise mean variance

Technical variation is nice and low. Some clear genes above technical variation

plt2 <- ggplot(tb, 
             aes(x = Mean, y= Variance)) + 
  geom_point(alpha = 0.2, size=0.5) + 
  guides(colour = guide_legend(override.aes = list(size=2, alpha=1))) +
  xlab("Mean of log-expression") + 
  ylab("Variance of log-expression") +
  scale_colour_brewer(type="qualitative", palette = "Dark2") +
  geom_function(fun = fit.sce$trend, colour = "darkgreen") +
  theme_Publication(base_size = 16)

plt2
The coloured line represents technical variation

The coloured line represents technical variation

PCA plots

Principal component analysis is a linear dimension reduction so the distances are interpretable.
It preserves local differences so not as good at visualising non-linear differences.

A PCA 1 and 2 of 38% and 16% is good for RNA-seq data.

# Convert to ordered factor with numeric day values
sce$Timepoint <- factor(sce$Timepoint,
                       levels = str_sort(unique(sce$Timepoint), numeric = TRUE),
                       ordered = TRUE)

Immune subset

  • Seems to be the major difference revealed by the PCA.
  • PCA 1 is the timepoint
  • PCA2 is less clear
  • Day 0 which is when the cells are thawed out

A clear difference in CD4 vs CD8 that is greater than the donor effect.

plt5 <- plotPCA(sce, colour_by="Timepoint", shape_by="Subset", point_size=2) + 
  guides(colour=guide_legend(ncol=2)) +
  theme_Publication(base_size=16)
plt5

Donor

plt4 <- plotPCA(sce, colour_by="Timepoint", shape_by="Donor", point_size=2) + 
  guides(colour=guide_legend(ncol=2)) +
  theme_Publication(base_size=16)

plt4

Genes detected

At timepoint 2 days post activation have the most detected genes.
This makes sense that the cells are most supercharged here.

plt6 <- plotPCA(sce, colour_by="detected", shape_by="Receptor") + theme_Publication(base_size=18)
plt6

Variance explained

Keep 14 PCs to focus on the biological variation.

# Percentage of variance explained is tucked away in the attributes.
percent.var <- attr(reducedDim(sce), "percentVar")

plot(percent.var, xlab="PC", ylab="Variance explained (%)")

UMAP

UMAP is a non-linear dimension reduction and emphasises global structure. The distances between groups don’t mean anything. You can say something about how tightly a group packs into a cluster.

set.seed(100)
sce <- runUMAP(sce, dimred = "PCA", n_dimred=14)

Timepoint

Very curious that day 1 and day 2 has the most dramatic differences and day 0 does not.

In the PCA plots day 0 is more distinct.

plt10 <- plotReducedDim(sce, dimred="UMAP", colour_by="Timepoint",shape_by="Subset", point_size=2) +
  guides(colour=guide_legend(ncol=2)) +
  theme_Publication(base_size = 16)

plt10

Save data

saveRDS(sce, here::here(
  "data/TIRE_Tcell/SCEs/tcell_cluster.sce.rds"
))

Conclusion

  • Timepoint after stimulation is the major difference.
  • Minor differences with immune subset

Next steps

Differential expression testing.

Session info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Red Hat Enterprise Linux 9.4 (Plow)
## 
## Matrix products: default
## BLAS:   /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRblas.so 
## LAPACK: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRlapack.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggthemes_5.1.0              here_1.0.1                 
##  [3] knitr_1.48                  patchwork_1.3.0            
##  [5] platetools_0.1.7            scater_1.32.1              
##  [7] scran_1.32.0                scuttle_1.14.0             
##  [9] lubridate_1.9.3             forcats_1.0.0              
## [11] stringr_1.5.1               dplyr_1.1.4                
## [13] purrr_1.0.2                 readr_2.1.5                
## [15] tidyr_1.3.1                 tibble_3.2.1               
## [17] ggplot2_3.5.1               tidyverse_2.0.0            
## [19] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
## [21] Biobase_2.64.0              GenomicRanges_1.56.2       
## [23] GenomeInfoDb_1.40.1         IRanges_2.38.1             
## [25] S4Vectors_0.42.1            BiocGenerics_0.50.0        
## [27] MatrixGenerics_1.16.0       matrixStats_1.4.1          
## 
## loaded via a namespace (and not attached):
##  [1] gridExtra_2.3             rlang_1.1.4              
##  [3] magrittr_2.0.3            compiler_4.4.1           
##  [5] DelayedMatrixStats_1.26.0 vctrs_0.6.5              
##  [7] pkgconfig_2.0.3           crayon_1.5.3             
##  [9] fastmap_1.2.0             XVector_0.44.0           
## [11] labeling_0.4.3            utf8_1.2.4               
## [13] rmarkdown_2.28            tzdb_0.4.0               
## [15] UCSC.utils_1.0.0          ggbeeswarm_0.7.2         
## [17] xfun_0.48                 bluster_1.14.0           
## [19] zlibbioc_1.50.0           cachem_1.1.0             
## [21] beachmat_2.20.0           jsonlite_1.8.9           
## [23] highr_0.11                DelayedArray_0.30.1      
## [25] BiocParallel_1.38.0       irlba_2.3.5.1            
## [27] parallel_4.4.1            cluster_2.1.6            
## [29] R6_2.5.1                  RColorBrewer_1.1-3       
## [31] bslib_0.8.0               stringi_1.8.4            
## [33] limma_3.60.6              jquerylib_0.1.4          
## [35] Rcpp_1.0.13               FNN_1.1.4.1              
## [37] Matrix_1.7-0              igraph_2.0.3             
## [39] timechange_0.3.0          tidyselect_1.2.1         
## [41] rstudioapi_0.17.0         abind_1.4-8              
## [43] yaml_2.3.10               viridis_0.6.5            
## [45] codetools_0.2-20          lattice_0.22-6           
## [47] withr_3.0.1               evaluate_1.0.1           
## [49] pillar_1.9.0              generics_0.1.3           
## [51] rprojroot_2.0.4           hms_1.1.3                
## [53] sparseMatrixStats_1.16.0  munsell_0.5.1            
## [55] scales_1.3.0              glue_1.8.0               
## [57] metapod_1.12.0            tools_4.4.1              
## [59] BiocNeighbors_1.22.0      ScaledMatrix_1.12.0      
## [61] locfit_1.5-9.10           cowplot_1.1.3            
## [63] edgeR_4.2.2               colorspace_2.1-1         
## [65] GenomeInfoDbData_1.2.12   beeswarm_0.4.0           
## [67] BiocSingular_1.20.0       vipor_0.4.7              
## [69] cli_3.6.3                 rsvd_1.0.5               
## [71] fansi_1.0.6               S4Arrays_1.4.1           
## [73] viridisLite_0.4.2         uwot_0.2.2               
## [75] gtable_0.3.5              sass_0.4.9               
## [77] digest_0.6.37             SparseArray_1.4.8        
## [79] ggrepel_0.9.6             dqrng_0.4.1              
## [81] farver_2.1.2              htmltools_0.5.8.1        
## [83] lifecycle_1.0.4           httr_1.4.7               
## [85] statmod_1.5.0